source("utils.R")
theme_set(theme_minimal())Irregular Lattices
Dependencies
For this representation of the cells we will rely on the SpatialFeatureExperiment package. For preprocessing of the dataset we refer the reader to the vignette of the voyager package.
(sfe <- HeNSCLCData())class: SpatialFeatureExperiment
dim: 980 100290
metadata(0):
assays(1): counts
rownames(980): AATK ABL1 ... NegPrb22 NegPrb23
rowData names(3): means vars cv2
colnames(100290): 1_1 1_2 ... 30_4759 30_4760
colData names(17): Area AspectRatio ... nCounts nGenes
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : CenterX_global_px CenterY_global_px
imgData names(1): sample_id
unit: full_res_image_pixels
Geometries:
colGeometries: centroids (POINT), cellSeg (POLYGON)
Graphs:
sample01:
# Empty cells
colData(sfe)$is_empty <- colData(sfe)$nCounts < 1
# Select negative control probes
neg_inds <- str_detect(rownames(sfe), "^NegPrb")
# Number of negative control probes
sum(neg_inds)[1] 20
colData(sfe)$prop_neg <- colSums(counts(sfe)[neg_inds,])/colData(sfe)$nCounts
# Remove low quality cells
(sfe <- sfe[,!sfe$is_empty & sfe$prop_neg < 0.1])class: SpatialFeatureExperiment
dim: 980 100095
metadata(0):
assays(1): counts
rownames(980): AATK ABL1 ... NegPrb22 NegPrb23
rowData names(3): means vars cv2
colnames(100095): 1_1 1_2 ... 30_4759 30_4760
colData names(19): Area AspectRatio ... is_empty prop_neg
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : CenterX_global_px CenterY_global_px
imgData names(1): sample_id
unit: full_res_image_pixels
Geometries:
colGeometries: centroids (POINT), cellSeg (POLYGON)
Graphs:
sample01:
# Calculate count stats
rowData(sfe)$means <- rowMeans(counts(sfe))
rowData(sfe)$vars <- rowVars(counts(sfe))
rowData(sfe)$is_neg <- neg_inds
# log Counts
sfe <- logNormCounts(sfe)In this vignette we will show the metrics related to two marker genes, KRT17 which is a marker of basal cells and TAGLN which is a marker of smooth muscle cells.
plotSpatialFeature(sfe, c("KRT17"),
colGeometryName = "centroids", ncol = 2, scattermore = TRUE) +
theme_void()plotSpatialFeature(sfe, c("TAGLN"),
colGeometryName = "centroids", ncol = 2, scattermore = TRUE) +
theme_void()colGraph(sfe, "knn5") <- findSpatialNeighbors(sfe, method = "knearneigh",
dist_type = "idw", k = 5,
style = "W")
weights_neighbourhoods <- colGraph(sfe, "knn5")Local Measures for Bivariate Data
Bivariate Lee’s L
Lee’s is a bivariate measure that combines non-spatial Pearson Correlation with spatial autocorrelation via Moran’s I. This enables us to asses the spatial dependence of two variables in a single measure. The measure is defined as:
\[L(x,y) = \frac{n}{\sum_{i=1}^n(\sum_{j=1}^nw_{ij})^2}\frac{\sum_{i=1}^n(\sum_{j=1}^nw_{ij}(x_i-\bar{x}))(\sum_{j=1}^nw_{ij}(y_j-\bar{y}))}{\sqrt{\sum_{i=1}^nw_{ij}(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^nw_{ij}(y_i-\bar{y})^2}}\] with \(w_{ij}\) the spatial weights matrix, \(x\) and \(y\) the two variables and \(\bar{x}\) and \(\bar{y}\) their means.
Implementation using voyager
sfe <- runBivariate(sfe, type = "locallee",
feature1 = "KRT17", feature2 = "TAGLN")
plotLocalResult(sfe, "locallee", features = localResultFeatures(sfe, "locallee"),
ncol = 2, colGeometryName = "centroids")Implementation using spdep
loc <- lee(x = logcounts(sfe)["KRT17",],
y = logcounts(sfe)["TAGLN",],
n = length(logcounts(sfe)["TAGLN",]),
listw = weights_neighbourhoods)
#convert into a plain sf object for plotting
sf <- colGeometries(sfe)$cellSeg
sf$locEffect <- loc$localL
tm_shape(sf) + tm_fill(col = 'locEffect') Local Measures for Multivariate Data
Multivariate local Geary’s C
Geary’s C is a measure of spatial autocorrelation that is based on the difference between a variable and its neighbours. It is defined as
\[C_i = \sum_{j=1}^n w_{ij}(z_i-z_j)^2\]
and can be generalized to \(k\) parameters by expanding
\[c_{k,i} = \sum_{v=1}^k c_{v,i}\]
where \(c_{v,i}\) is the local Geary’s C for the \(v\)th variable at location \(i\). Compared to bivariates Lee it means that we use more than two variables.
We will use highly two marker genes of the dataset to calculate the multi-variate Geary’s C.
hvgs <- getTopHVGs(sfe, fdr.threshold = 0.01)
sfe <- runMultivariate(sfe, type = "localC_multi",
subset_row = c("KRT17", "TAGLN"),
dest = "colData")
# plotLocalResult(sfe, "localC_multi", features = c("KRT17"), ncol = 2,
# colGeometryName = "centroids")
# Local C mutli is stored in colData so this is a workaround to plot it
plotSpatialFeature(sfe,"localC_multi", colGeometryName = "centroids")The plot indicates regions where the gene expression is more similar homogeneous (low Geary’s C) and regions where the gene expression is more dissimilar heterogeneous (large Geary’s C value). Importantly, strong similarity in one or some variables may compensate for disimilarity in other variables. The local Geary’s C value is not scaled.
Note that the computation is very cost intensive.
sfe <- runMultivariate(sfe, type = "localC_perm_multi",
subset_row = c("KRT17", "TAGLN"),
dest = "colData")
# Local C mutli is stored in colData so this is a workaround to plot it
plotSpatialFeature(sfe,"localC_perm_multi", colGeometryName = "centroids")We can further plot the results of the permutation test. This can indicate interesting regions, but should not interpreted with care for different reasons. As we are looking for similarity in a combination of multiple values and the exact combination is not known. For further details see Anselin (2019).
loc <- localC_perm(x = t(as.matrix(logcounts(sfe)[c("KRT17", "TAGLN"),])),
listw = weights_neighbourhoods, nsim = 1)
#convert into a plain sf object for plotting
sf <- colGeometries(sfe)$cellSeg
sf$locEffect <- loc
tm_shape(sf) + tm_fill(col = 'locEffect') It would be interesting to do this not on the cell level but on the domain level.
Local Neighbor Match Test
This test is useful to assess the overlap of the k-nearest neighbour in the tissue space with the k-nearest neighbours in the attribute space.
sf <- colGeometries(sfe)$cellSeg
sf <- cbind(sf, t(as.matrix(logcounts(sfe)[c(hvgs),])))
nbr_test <- neighbor_match_test(sf[c(hvgs)], k = 20)
sf$Probability <- nbr_test$Probability
sf$Cardinality <- nbr_test$Cardinality
tm_shape(sf) + tm_fill(col = 'Probability') tm_shape(sf) + tm_fill(col = 'Cardinality') Cardinality is a measure of how many neighbours of each cell are in common. Some regions show high crardinality with low probability. On the cellular level, this might no be very informative and lead to regions with high similarity. We only see very few cells with a cardinality greater than 0. The problem might come from the imperfect segmentation of the cells, which results in a poor reconstruction of the “geographical” tissue neighbourhood graph. In addition, as the number of dimensions increase, the empty space between observations also increases. This is known as the empty space problem.
Again this analysis should probably be performed on a structure level instead of a single cell level.
References
Anselin, Luc. 1995. “Local Indicators of Spatial Association—LISA.” Geogr. Anal. 27 (2): 93–115. ———. 2019. “A Local Indicator of Multivariate Spatial Association: Extending Geary’s c.” Geogr. Anal. 51 (2): 133–50.
Appendix
Session info
sessionInfo()R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Zurich
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] stringr_1.5.0 dixon_0.0-8
[3] splancs_2.01-44 spdep_1.2-8
[5] spData_2.3.0 tmap_3.3-4
[7] scater_1.28.0 scran_1.28.2
[9] scuttle_1.10.3 SFEData_1.2.0
[11] SpatialFeatureExperiment_1.2.3 Voyager_1.2.7
[13] rgeoda_0.0.10-4 digest_0.6.33
[15] ncf_1.3-2 sf_1.0-14
[17] reshape2_1.4.4 patchwork_1.1.3
[19] STexampleData_1.8.0 ExperimentHub_2.8.1
[21] AnnotationHub_3.8.0 BiocFileCache_2.8.0
[23] dbplyr_2.3.4 RANN_2.6.1
[25] seg_0.5-7 sp_2.1-1
[27] rlang_1.1.1 ggplot2_3.4.4
[29] dplyr_1.1.3 mixR_0.2.0
[31] spatstat_3.0-6 spatstat.linnet_3.1-1
[33] spatstat.model_3.2-6 rpart_4.1.19
[35] spatstat.explore_3.2-3 nlme_3.1-162
[37] spatstat.random_3.1-6 spatstat.geom_3.2-5
[39] spatstat.data_3.0-1 SpatialExperiment_1.10.0
[41] SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.2
[43] Biobase_2.60.0 GenomicRanges_1.52.1
[45] GenomeInfoDb_1.36.4 IRanges_2.34.1
[47] S4Vectors_0.38.2 BiocGenerics_0.46.0
[49] MatrixGenerics_1.12.3 matrixStats_1.0.0
loaded via a namespace (and not attached):
[1] spatstat.sparse_3.0-2 bitops_1.0-7
[3] httr_1.4.7 RColorBrewer_1.1-3
[5] tools_4.3.1 utf8_1.2.3
[7] R6_2.5.1 HDF5Array_1.28.1
[9] mgcv_1.8-42 rhdf5filters_1.12.1
[11] withr_2.5.1 gridExtra_2.3
[13] leaflet_2.2.0 leafem_0.2.3
[15] cli_3.6.1 labeling_0.4.3
[17] proxy_0.4-27 R.utils_2.12.2
[19] dichromat_2.0-0.1 scico_1.5.0
[21] limma_3.56.2 rstudioapi_0.15.0
[23] RSQLite_2.3.1 generics_0.1.3
[25] crosstalk_1.2.0 Matrix_1.5-4.1
[27] ggbeeswarm_0.7.2 fansi_1.0.5
[29] abind_1.4-5 R.methodsS3_1.8.2
[31] terra_1.7-55 lifecycle_1.0.3
[33] yaml_2.3.7 edgeR_3.42.4
[35] rhdf5_2.44.0 tmaptools_3.1-1
[37] grid_4.3.1 blob_1.2.4
[39] promises_1.2.1 dqrng_0.3.1
[41] crayon_1.5.2 lattice_0.21-8
[43] beachmat_2.16.0 KEGGREST_1.40.1
[45] magick_2.8.0 pillar_1.9.0
[47] knitr_1.44 metapod_1.7.0
[49] rjson_0.2.21 boot_1.3-28.1
[51] codetools_0.2-19 wk_0.8.0
[53] glue_1.6.2 vctrs_0.6.4
[55] png_0.1-8 gtable_0.3.4
[57] cachem_1.0.8 xfun_0.40
[59] S4Arrays_1.0.6 mime_0.12
[61] DropletUtils_1.20.0 units_0.8-4
[63] statmod_1.5.0 bluster_1.10.0
[65] interactiveDisplayBase_1.38.0 ellipsis_0.3.2
[67] bit64_4.0.5 filelock_1.0.2
[69] irlba_2.3.5.1 vipor_0.4.5
[71] KernSmooth_2.23-21 colorspace_2.1-0
[73] DBI_1.1.3 raster_3.6-26
[75] tidyselect_1.2.0 bit_4.0.5
[77] compiler_4.3.1 curl_5.1.0
[79] BiocNeighbors_1.18.0 DelayedArray_0.26.7
[81] scales_1.2.1 classInt_0.4-10
[83] rappdirs_0.3.3 goftest_1.2-3
[85] spatstat.utils_3.0-3 rmarkdown_2.25
[87] XVector_0.40.0 htmltools_0.5.6.1
[89] pkgconfig_2.0.3 base64enc_0.1-3
[91] sparseMatrixStats_1.12.2 fastmap_1.1.1
[93] htmlwidgets_1.6.2 shiny_1.7.5.1
[95] DelayedMatrixStats_1.22.6 farver_2.1.1
[97] jsonlite_1.8.7 BiocParallel_1.34.2
[99] R.oo_1.25.0 BiocSingular_1.16.0
[101] RCurl_1.98-1.12 magrittr_2.0.3
[103] GenomeInfoDbData_1.2.10 s2_1.1.4
[105] Rhdf5lib_1.22.1 munsell_0.5.0
[107] Rcpp_1.0.11 ggnewscale_0.4.9
[109] viridis_0.6.4 stringi_1.7.12
[111] leafsync_0.1.0 zlibbioc_1.46.0
[113] plyr_1.8.9 parallel_4.3.1
[115] ggrepel_0.9.4 deldir_1.0-9
[117] Biostrings_2.68.1 stars_0.6-4
[119] splines_4.3.1 tensor_1.5
[121] locfit_1.5-9.8 igraph_1.5.1
[123] ScaledMatrix_1.8.1 BiocVersion_3.17.1
[125] XML_3.99-0.14 evaluate_0.22
[127] BiocManager_1.30.22 httpuv_1.6.11
[129] purrr_1.0.2 polyclip_1.10-6
[131] scattermore_1.2 rsvd_1.0.5
[133] lwgeom_0.2-13 xtable_1.8-4
[135] e1071_1.7-13 RSpectra_0.16-1
[137] later_1.3.1 viridisLite_0.4.2
[139] class_7.3-22 tibble_3.2.1
[141] memoise_2.0.1 beeswarm_0.4.0
[143] AnnotationDbi_1.62.2 cluster_2.1.4